INVESTIGATION 



Satellite DNA-Like Elements Associated With Genes 
Within Euchromatin of the Beetle Tribolium 
castaneum 

Josip Brajkovic,* Isidoro Feliciello,*,† Branka Bruvo-Madaric,* and Durdica Ugarkovic*,1

*Department of Molecular Biology, Ruder Boskovic Institute, Bijenicka 54, HR-10000 Zagreb, Croatia, and †Dipartimento
di Medicina Clinica e Sperimentale, Universita degli Studi di Napoli Federico II, via Pansini 5, I-80131 Napoli, Italy



ABSTRACT In the red flour beetle Tribolium castaneum the major TCAST satellite DNA accounts for 35% 
of the genome and encompasses the pericentromeric regions of all chromosomes. Because of the presence 
of transcriptional regulatory elements and transcriptional activity in these sequences, TCAST satellite DNAs 
also have been proposed to be modulators of gene expression within euchromatin. Here, we analyze the 
distribution of TCAST homologous repeats in T. castaneum euchromatin and study their association with 
genes as well as their potential gene regulatory role. We identified 68 arrays composed of TCAST-like 
elements distributed on all chromosomes. Based on sequence characteristics the arrays were composed of 
two types of TCAST-like elements. The first type consists of TCAST satellite-like elements in the form of 
partial monomers or tandemly arranged monomers, up to tetramers, whereas the second type consists of 
TCAST-like elements embedded within a complex unit that resembles a DNA transposon. TCAST-like
elements were also found in the 5' untranslated region (UTR) of the CR1-3_TCa retrotransposon, and therefore
retrotransposition may have contributed to their dispersion throughout the genome. No significant
difference in the homogenization of dispersed TCAST-like elements was found at the level of local arrays,
within chromosomes, or among different chromosomes. Of the 68 TCAST-like elements, 29 were located within
introns, with the remaining elements flanked by genes within a 262 to 404,270 nt range. TCAST-like 
elements are statistically overrepresented near genes with immunoglobulin-like domains attesting to their 
nonrandom distribution and a possible gene regulatory role. 

Based on the hypothesis of Britten and Davidson (1971), repetitive
elements can be a source of regulatory sequences and act to distribute 
regulatory elements throughout the genome. In particular, mobile 
transposable elements (TEs) are predicted to be a source of noncoding 
material that allows for the emergence of genetic novelty and influ- 
ences evolution of gene regulatory networks (Feschotte 2008). Re- 
cently it has been shown that at least 5.5% of conserved noncoding 
elements unique to mammals originate from mobile elements and are 
preferentially located close to genes involved in development and
transcription regulation (Lowe et al. 2007). The complete sequence
conservation, wide evolutionary distribution, and presence of functional
elements such as promoters and transcription factor binding
sites within some satellite DNA sequences have led to the assumption
that, in addition to participating in centromere formation, they might
also act as cis-regulatory elements of gene expression (Ugarkovic
2005). To perform potential regulatory functions, satellite DNA elements
are predicted to be preferentially distributed in the euchromatic
portion of the genome, in the vicinity of genes. Whole-genome
sequencing projects enable the presence and distribution of satellite
DNA repeats in the euchromatic portion of the genome to be de- 
termined. The analysis of satellite DNA-like elements dispersed within 
euchromatin, and their comparison with homologous elements pres- 
ent within heterochromatin, also may reveal insights into the origin of 
satellite DNAs and their subsequent evolution (Kuhn et al. 2012). 

Satellite DNAs are major building elements of pericentromeric 
and centromeric heterochromatin in many eukaryotic species, and in 
certain species they account for the majority of genomic DNA, as in
beetles from the coleopteran family Tenebrionidae (Ugarkovic and 
Plohl 2002). In the red flour beetle Tribolium castaneum, pericentro- 
meric heterochromatin comprises approximately 40% of the genome, 
and TCAST satellite DNA has previously been characterized as the 
major satellite that encompasses centromeric as well as pericentro- 
meric regions of all 20 chromosomes (Ugarkovic et al. 1996). TCAST 
satellite is composed of two subfamilies, Tcast1a and Tcast1b, which
together comprise 35% of the whole genome. Tcast1a and Tcast1b
have an average homology of 79% and are of similar size, at 362 bp and
377 bp, respectively, but they are characterized by a divergent,
subfamily-specific region of approximately 100 bp (Feliciello et al. 2011).
The genome sequencing project of T. castaneum has recently been 
completed (Richards et al. 2008). Sequencing involved the euchro- 
matic portion of the genome, with >20% of the genome, correspond- 
ing to heterochromatic regions, excluded due to technical difficulties. 

In this article, we searched for the presence of TCAST satellite- 
homologous elements within the assembled T. castaneum genome by 
using a comprehensive computational analysis. By searching the se- 
quenced T. castaneum genome, we found 68 TCAST satellite DNA 
arrays within the euchromatin of all chromosomes. They were map- 
ped to 5' or 3' ends, as well as within introns, of more than 100 
protein-coding genes. Based on sequence characteristics, dispersed
TCAST-like elements were classified into two groups. The first group
includes partial TCAST satellite monomers or short arrays of tandemly
arranged monomers up to tetramers. The second group contains
TCAST-like elements embedded within complex repeat units that
contain two hallmarks of DNA transposons: terminal inverted repeats
and target-site duplications. The evolutionary relationship and possible
modes of dispersion of the two types of dispersed TCAST-like
sequences are discussed. In addition, we examined the sequence di- 
vergence, phylogenetic relationship, and chromosomal distribution of 
the elements. Annotation, characterization, and classification of genes 
within the region of TCAST-like elements are reported, with the 
preferential localization of TCAST-like elements near specific groups 
of genes identified. Our results demonstrate, for the first time, the
enrichment of satellite DNA-like elements in the vicinity of genes with
immunoglobulin-like domains and suggest their possible gene-regulatory
role.

MATERIALS AND METHODS 

BLASTN version 2.2.22+ was used to screen the NCBI refseq_genomic
database of T. castaneum. All scaffolds that have not been
mapped to linkage groups were also screened. The program was optimized
to search for sequences highly similar (megablast) to the query
sequence [TCAST consensus sequence (Ugarkovic et al. 1996)]. Genes
flanking TCAST-homologous elements were found automatically by
NCBI BLAST. Sequences corresponding to hits, as well as their flanking
regions, were analyzed by dot plot (http://www.vivo.colostate.edu/molkit/dnadot/),
using standard parameters (window size 9, mismatch
limit 0) or more relaxed conditions (window size 11, mismatch limit 1),
to determine the exact start and end site of specific TCAST-like
elements. The TCAST transposon-like elements were analyzed in 
detail for the presence of hallmarks such as terminal inverted repeats 
(TIRs) and target-site duplications with the aid of the Gene Jockey 
sequence analysis program (for Apple Macintosh). Secondary struc- 
tures were determined using the default parameters of the MFOLD 
program available online [http://mfold.rna.albany.edu/?q=mfold (Zuker 
2003)]. AT content was analyzed using BioEdit Sequence Alignment 
Editor (Hall 1999). Repbase, a reference database of eukaryotic repetitive 
DNA, was screened using WU-BLAST (Kohany et al. 2006). 
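
The dot-plot comparison used to delimit elements can be illustrated with a minimal Python sketch. It marks every pair of positions at which windows of the two sequences differ by no more than the mismatch limit, using the standard settings given above (window size 9, mismatch limit 0) as defaults. The function name and the toy sequences are illustrative assumptions; this is not the implementation of the online tool.

```python
def dot_plot(seq_a, seq_b, window=9, mismatch_limit=0):
    """Return (i, j) pairs where the windows starting at i in seq_a and
    j in seq_b differ at no more than `mismatch_limit` positions."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            win_b = seq_b[j:j + window]
            mismatches = sum(1 for x, y in zip(win_a, win_b) if x != y)
            if mismatches <= mismatch_limit:
                hits.append((i, j))
    return hits


# Hypothetical toy sequences: one 12-nt motif embedded in different flanks.
motif = "ACGTTGCAAGCT"
query = "GG" + motif + "TT"
target = "CCCC" + motif + "AAAA"

strict = dot_plot(query, target)                                 # window 9, no mismatch
relaxed = dot_plot(query, target, window=11, mismatch_limit=1)   # relaxed settings
```

The first and last diagonal hits give an approximate start and end of the shared region; the relaxed settings extend the diagonal into positions where a single mismatch is tolerated.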



Sequence alignment was performed using the MUSCLE algorithm
(Edgar 2004) combined with manual adjustment. All sequences were 
included in the alignment, with the exception of the ones that did not 
at least partially overlap with other sequences. Gblocks was used to 
eliminate poorly aligned positions and divergent regions of the align- 
ments (Talavera and Castresana 2007). Alignments (original fasta 
files) are available upon request. jModelTest 0.1.1 software (Posada
2008) was used to infer best-fit models of DNA evolution: TPM3uf+G
for transposon-like and A type elements and TPM1uf for B type
elements. Maximum likelihood (ML) trees were estimated with the
PhyML 3.0 software (Guindon and Gascuel 2003) using best-fit
models. Markov chain Monte Carlo Bayesian searches were performed
in MrBayes v. 3.1.2 (Huelsenbeck and Ronquist 2001) under
the best-fit models (two simultaneous runs, each with four chains; 3
× 10^6 generations; sampling frequency one in every 100 generations;
majority rule consensus trees constructed based on trees sampled 
after burn-in). Branch support was evaluated by bootstrap analysis 
(1000 replicates) in ML and by posterior probabilities in Bayesian 
analyses. Pairwise sequence diversity (uncorrected P) was calculated 
using the MEGA 5.05 software (Tamura et al. 2011). 
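
For orientation, the uncorrected P distance is simply the proportion of compared sites at which two aligned sequences differ. The sketch below is a minimal Python illustration, not the MEGA implementation; the function names, the toy alignment, and the choice to skip gapped columns pair by pair (pairwise deletion) are assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Uncorrected p: differing sites divided by compared sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue  # assumption: pairwise deletion of gapped columns
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_pairwise_distance(aligned):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, invented for illustration only.
alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACG-ACGTAC"]
average = mean_pairwise_distance(alignment)
```

Multiplying the returned proportion by 100 gives a percentage comparable in form to the average pairwise divergences reported in the Results.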

T. castaneum gene homologs in Drosophila melanogaster were
searched for using the OrthoDB phylogenomic database. Each gene has
an OrthoDB identifier, with Uniprot data linked to OrthoDB (Waterhouse
et al. 2011). To find sets of biological annotations that frequently
appear together and are significantly enriched in a set of
genes located near TCAST-like elements, the GeneCodis 2.0 program,
available online (http://genecodis.dacya.ucm.es/), was used. GeneCodis
generates statistical rank scores for single annotations and their combinations.
To find all the possible combinations of annotations, GeneCodis
uses the apriori algorithm introduced by Agrawal et al. (1993).
Once the annotations were extracted, a statistical analysis based on the
hypergeometric distribution or the χ² test of independence was executed
to calculate the statistical significance (P values) for each individual
annotation or co-annotation.

A two-tailed hypergeometric test with Bonferroni correction (alpha =
0.025) was used to analyze the distribution of TCAST-like elements
among T. castaneum chromosomes. For each chromosome, the frequency
of TCAST-like elements was compared with the frequency in
the complete sample, and the significance of deviations was calculated.
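
A per-chromosome test of this kind can be sketched in Python as follows. The sketch is illustrative only: the text does not specify how the reference population was defined, so the example assumes that each chromosome is represented by a number of equally sized units, that the two-tailed P value is twice the smaller tail (one common convention), and that the Bonferroni threshold is alpha divided by the number of chromosomes tested. All names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when `draws` items are sampled without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def two_tailed_p(k, population, successes, draws):
    """Two-tailed P value: twice the smaller tail, capped at 1."""
    lo = max(0, draws - (population - successes))
    hi = min(draws, successes)
    lower = sum(hypergeom_pmf(i, population, successes, draws)
                for i in range(lo, k + 1))
    upper = sum(hypergeom_pmf(i, population, successes, draws)
                for i in range(k, hi + 1))
    return min(1.0, 2.0 * min(lower, upper))


def bonferroni_flags(p_values, alpha=0.05):
    """True where a test remains significant at alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical toy data: chromosome -> (units on chromosome, elements observed).
chromosomes = {"chrA": (20, 2), "chrB": (30, 3), "chrC": (50, 5)}
total_units = sum(units for units, _ in chromosomes.values())
total_elements = sum(count for _, count in chromosomes.values())

p_values = [two_tailed_p(count, total_units, units, total_elements)
            for units, count in chromosomes.values()]
flags = bonferroni_flags(p_values)
```

In the toy data every chromosome carries exactly its expected share of elements, so no test is flagged, which mirrors the form of a result compatible with random distribution.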

RESULTS 

Identification of dispersed TCAST-like elements 

Using the consensus sequence of TCAST satellite DNA (Ugarkovic 
et al. 1996) as a query sequence, we screened the NCBI refseq_genomic
database of T. castaneum with the alignment program
BLASTN version 2.2.22+. The program was optimized to search for 
highly similar sequences (megablast) and blast hits on the query se- 
quence were analyzed individually. Alignments were mapped regard- 
ing start and end site, chromosome number, and total length. When 
the distance between two alignments on the same chromosome was 
short, the genomic sequence was further analyzed by dot plot to 
identify any potential continuity between the two alignments. Only 
genomic sequences with at least 140 nt (40% of TCAST monomer 
length) of continuous sequence and >80% identity to the TCAST 
consensus sequence were considered for further analysis. The total 
number of dispersed TCAST-like elements was 68, with 36 elements 
flanked by genes at both 5' and 3' ends, 3 elements flanked by a single 
gene at either the 5' or the 3' end (sequences no. 36, 39, 50), and 29
elements positioned within introns (Table 1). Apart from these 68 TCAST-like
elements associated with genes, no other dispersed TCAST-like
elements were found within the assembled T. castaneum genome.
Analysis of scaffolds that have not been mapped to linkage groups
revealed the presence of an additional 41 TCAST-like elements, but
because they were not mapped to the T. castaneum genome and could
possibly derive from heterochromatin, we did not consider them for
further analysis.
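
The selection criteria stated above (at least 140 nt of continuous sequence and >80% identity to the TCAST consensus) amount to a simple filter over BLAST hits. The following Python sketch shows the logic under stated assumptions: the `Hit` record, its field names, and the example hits are hypothetical, and the manual dot-plot step used to join neighboring alignments is not modeled.

```python
from dataclasses import dataclass

MIN_LENGTH = 140      # nt, threshold stated in the text
MIN_IDENTITY = 80.0   # percent, threshold stated in the text (strictly greater)


@dataclass
class Hit:
    chromosome: str
    start: int        # 1-based, inclusive
    end: int          # 1-based, inclusive
    identity: float   # percent identity to the TCAST consensus

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passes_filter(hit: Hit) -> bool:
    """Keep hits that are long enough and similar enough to the consensus."""
    return hit.length >= MIN_LENGTH and hit.identity > MIN_IDENTITY


# Hypothetical hits, invented for illustration only.
hits = [
    Hit("LG3", 1000, 1359, 91.2),   # 360 nt: kept
    Hit("LG5", 5000, 5099, 95.0),   # 100 nt: too short
    Hit("LG9", 7000, 7202, 78.5),   # 203 nt: identity too low
]
kept = [h for h in hits if passes_filter(h)]
```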

There were only three cases in which two different TCAST-like 
elements were associated with the same gene: gene D6X2C4 contains 
TCAST-like sequences no. 6 and 13 within introns, gene D6X2U7 is 
flanked at its 5' and 3' ends by sequences no. 5 and 7, respectively,
whereas gene D6WB29 is located at the 3' end of sequence no. 53
and has sequence no. 52 within an intron. All other TCAST-like
elements were positioned near or within different genes. Thus in total, 
there were 101 genes found in the vicinity of TCAST-like elements. 
Characteristics of the genes associated with TCAST-like elements,
including gene identity number, gene name and chromosomal location,
position relative to the associated TCAST-like element, and distances
between TCAST-like elements and genes, are shown in Table 1.
Distances between TCAST-like elements and genes range from 262 nt
(gene positioned at the 3' side of sequence no. 36) to a maximal
distance of 404,270 nt (gene positioned at the 5' side of sequence
no. 5).

Characteristics of TCAST-like elements 

TCAST satellite-like elements: Sequence analysis of the 68 TCAST- 
like elements identified within the vicinity of genes enabled their 
classification into two groups. The first group contains partial TCAST 
satellite monomers or tandemly arranged elements, either complete or 
partial dimers, trimers, or tetramers (Table 1). The minimal size of 
satellite repeat was 203 nt (0.6 of complete TCAST monomer; se- 
quence no. 15), whereas the maximal size was 1440 nt (four complete 
TCAST monomers; sequence no. 43; Table 1). In many sequences, 
two subtypes of TCAST satellite monomers were mutually interspersed:
Tcast1a and Tcast1b. Tcast1b corresponds to the TCAST
satellite consensus that was used as a query sequence (Ugarkovic et al.
1996), and Tcast1a corresponds to the TCAST subfamily described in
Feliciello et al. (2011). Tcast1a and Tcast1b have an average homology
of 79% and are of similar sizes, at 362 bp and 377 bp, respectively, but
are characterized by a divergent, subfamily-specific region of approximately
100 bp (Feliciello et al. 2011). There were 34 TCAST satellite-like
elements found within or in the region of 53 genes. Lengths of
TCAST satellite-like elements (Table 1), their exact start and end sites
within genomic sequence, and their composition (supporting information,
Table S1) are provided.

To see whether there is any clustering of sequences of TCAST 
satellite-like elements due to the difference in the homogenization at 
the level of local array, chromosome, or among different chromo- 
somes, sequence alignment and phylogenetic analysis were performed. 
Tcast1a and Tcast1b subunits were extracted from TCAST satellite-like
sequences and analyzed separately. Alignment was performed on
24 Tcast1a subunits, ranging in size from 136 to 377 bp (File S1).
The average pairwise distance between Tcast1a subunits of TCAST
satellite-like sequences was 5.8%. Alignment adjustment using Gblocks,
which eliminates poorly aligned positions and divergent regions,
resulted in few changes; therefore, the original, unadjusted alignment
was used for the construction of phylogenetic trees. Because
the sequences differ in lengths and comprise regions of divergent 
variability, methods that take into account specific models of DNA 
evolution were considered as the most suitable for the construction 
of phylogenetic trees, namely ML and Bayesian (Markov chain Monte
Carlo). The ML tree showed weak resolution with no significant
support for clustering of sequences derived from the same satellite-like 
array or from the same chromosome. Similarly, the Bayesian tree dem- 
onstrated no significant sequence clustering (Figure 1A). 

Alignment of 28 Tcast1b subunits, ranging from 159 bp to 363 bp
(File S2), was also not significantly affected by adjustment with
Gblocks; therefore, the unadjusted alignment was used for the construction
of phylogenetic trees (Figure 1B). The average pairwise divergence
between Tcast1b subunits of TCAST satellite-like sequences
was 6.7%. In the ML phylogenetic tree, four groups composed of
two or three sequences were resolved, with relatively low bootstrap
values. However, the majority of Tcast1b subunit sequences remained
unresolved. There was no clustering of subunits derived from the
same array or the same chromosome (Figure 1B). Bayesian tree analysis
produced one significantly supported cluster composed of 10
sequences derived from 7 chromosomes (Figure 1B).

TCAST transposon-like elements: The second group of TCAST-like 
repeats is represented by a complex element that contains an almost 
complete TCAST (or Tcast1b) monomer, and a TCAST monomer
segment of approximately 121 bp in an inverted orientation. These 
two TCAST segments are separated by a nonsatellite sequence of 
approximately 306 bp. Both TCAST segments are part of TIRs that are 
approximately 269 bp long (Figure 2). As a result of the long TIRs, 
these elements are likely to form stable hairpin secondary structures 
and therefore resemble transposons. The nonsatellite part of sequence, 
common for all TCAST transposon-like elements, is unique in that it 
does not exhibit significant homology to any other sequence within 
the T. castaneum genome. There were 34 TCAST transposon-like 
elements found within or in the vicinity of 50 genes. Their lengths 
(Table 1) and exact start and end sites within genomic sequence 
(Table S1) are provided. Sequence analysis of TCAST transposon-like
elements determined that 13 of them were >1000 bp, with a maximal
size of 1181 bp (Table 1). The remaining TCAST transposon-like
elements were shorter, with a minimal size of 314 bp (sequence no.
27), and usually lacked part of a TIR, or one or both TIRs. Conserved TIRs
are necessary for transposition, and if they are absent, truncated, or 
mutated so that the transposase cannot interact with the transposon 
sequence, the transposon cannot be mobilized and therefore repre- 
sents a molecular fossil of a once active transposon (Capy et al. 1998). 
Despite mutations and partial truncations of TIRs within the TCAST 
transposon-like elements, and likely because of the length of the TIRs, 
most of the elements still preserve a stable secondary structure and 
could potentially remain mobile. 
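
The defining property of a TIR, that the 3' end of the element is the reverse complement of its 5' end, can be checked with a short Python sketch. This is a simplified, ungapped comparison written for illustration; the function names and toy sequences are assumptions, and real TIRs of the length described here (approximately 269 bp, with mutations and truncations) would normally require a gapped alignment.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tir_identity(element, tir_length):
    """Fraction of positions at which the first `tir_length` nt match the
    reverse complement of the last `tir_length` nt (ungapped comparison)."""
    if tir_length <= 0 or 2 * tir_length > len(element):
        raise ValueError("tir_length must be positive and fit twice in the element")
    left = element[:tir_length].upper()
    right_rc = reverse_complement(element[-tir_length:])
    matches = sum(1 for x, y in zip(left, right_rc) if x == y)
    return matches / tir_length


# Hypothetical toy elements: 8-nt TIRs around a 5-nt internal sequence.
intact = "ACGGTTCA" + "GGGGG" + "TGAACCGT"
mutated = "ACGGTTCA" + "GGGGG" + "TGAACCGA"   # one substitution in the 3' TIR

intact_score = tir_identity(intact, 8)
mutated_score = tir_identity(mutated, 8)
```

A score near 1 indicates well-conserved TIRs; lower scores would correspond to the mutated or truncated TIRs described above.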

Some TCAST transposon-like elements >1000 bp have a 3-bp
duplication at the site of insertion in the form of ACT. One TCAST
transposon-like element (sequence no. 39) is inserted into another 
repetitive DNA, indicated as Tcast2, which had been previously iden- 
tified bioinformatically (Wang et al. 2008). Sequence analysis of this 
transposon-like element confirms the continuity of Tcast2 from the 
duplication site "ACT." Typically, the size of the target-site duplication is
a hallmark of different superfamilies of eukaryotic DNA transposons,
with mariner/Tc1 being the only superfamily whose members are characterized
by either 2- or 3-bp target-site duplications (Capy et al. 1998;
Kapitonov and Jurka 2003; Feschotte and Pritham 2007). There are
three open reading frames (ORFs) within TCAST transposon-like 
sequences, but the resulting putative proteins are very short and do 
not share similarity with any other proteins (Figure 2). The elements 
therefore do not code for transposases and are considered nonauton- 
omous. Using the whole TCAST transposon-like elements as a query 
sequence, we searched the T. castaneum GenBank database for "full-sized"
homologous elements that could potentially code for transposases
and be considered autonomous. The search identified an element,
named TR 1.9, with a 925-bp sequence inserted within the unique sequence
of the TCAST transposon-like elements (Figure 2). This
925-bp sequence contains an ORF of 206 amino acids and a conserved
domain belonging to the Transposase 1 superfamily, which
also includes the mariner transposase. DNA transposons of the
mariner/Tc1 superfamily, Mariner-1_TCa and Mariner-2_TCa, were
identified within the T. castaneum genome (Jurka 2009a, 2009b).
Using BLASTP and the translated ORF of the 925-bp sequence
as a query sequence, we identified hits with partial homology to
the Mariner-2_TCa transposase and to mariner-like element transposases
present in two other insects, the beetle Agrilus planipennis
(emerald ash borer) and Chrysoperla plorabunda (green lacewing;
Neuroptera), but not to the Mariner-1_TCa transposase.

To test whether there is any chromosome-specific sequence 
clustering of TCAST transposon-like sequences that could suggest 
differences in homogenization within and among
chromosomes, alignment and subsequent phylogenetic analysis
of TCAST transposon-like sequences were performed. Because TCAST
transposon-like elements differ significantly in size (314 to 1181 nt), the
alignment and phylogenetic analyses were performed on 25 elements
that mutually overlap in their sequences, whereas the other nine
TCAST transposon-like elements were excluded from the analysis
because of their very low overlap with other elements. The alignment was
additionally adjusted using Gblocks (File S3). The average pairwise
divergence among TCAST transposon-like sequences was 12.7%.
ML and Bayesian methods gave similar tree topologies (Figure 1C).
The ML tree showed very weak resolution of TCAST transposon-like
sequences and a general absence of subgroups with specific sequence
characteristics (Figure 1C). Only two clusters were formed in the ML tree, whereas
the Bayesian tree contained three well-supported groups, two
of which were the same as in the ML tree (Figure 1C).

Distribution of TCAST-like elements 
on T. castaneum chromosomes 

TCAST-like elements found in the vicinity of genes were distributed 
on all 10 T. castaneum chromosomes (Table 1). Positions of consti- 
tutive heterochromatin and euchromatin were assigned on the haploid 
set of T. castaneum chromosomes, based on C-banding data (Stuart 
and Mocelin 1995) and Tribolium castaneum 3.0 Assembly data (Fig- 
ure 3). Within euchromatic segments, the position of each TCAST- 
like element is specifically indicated (Figure 3) based on the position 
within the genomic sequence (Table S1). TCAST-like elements were
dispersed on both arms of chromosomes 3, 5, 7, and 9, whereas on
other chromosomes they were located on a single arm (Figure 3). The
number of TCAST-like elements ranged from 2 on chromosome 1 (X)
to 17 on chromosomes 3 and 9. To detect whether TCAST-like elements
were distributed randomly among the T. castaneum chromosomes
or whether there was a significant over- or underrepresentation
of the elements on some chromosomes, we performed a hypergeometric
test. The analysis revealed no statistically significant
deviation in the number of TCAST-like elements among the
chromosomes (Figure S1), pointing to their random distribution.

Figure 1 Bayesian/ML phylogenetic trees of: (A) TCAST satellite-like elements (subunits Tcast1a), (B) TCAST satellite-like elements (subunits
Tcast1b), and (C) TCAST transposon-like elements. Sequence numbers correspond to those in Table 1. When a particular sequence is composed
of several subrepeats (e.g., Tcast1a or Tcast1b), numbers indicating subrepeats are added (e.g., 43_1, 43_2, 43_3). Numbers in brackets indicate
chromosomes on which the corresponding sequences are located. Numbers on branches indicate Bayesian posterior probabilities/ML bootstrap
support (above 0.5/50%, respectively).

To determine whether there was a target preference for the
insertion of TCAST-like elements, for example high AT content or
another sequence characteristic, we analyzed the AT content within
100 bp of the flanking regions for each TCAST-like element, from
both 5' and 3' sites (Figure S2 and Figure S3). The average AT content
of the flanking regions for both TCAST satellite-like elements and
TCAST transposon-like elements did not differ significantly from
the average AT content of the whole T. castaneum genome or from
the AT content of randomly selected intergenic regions and
introns. Thus, this finding suggests that with regard to AT content,
there is no target preference for the insertion of TCAST-like
elements. Furthermore, alignment and comparison of all flanking
sequences of TCAST-like elements did not identify any common
sequence motifs.
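
The flank analysis reduces to computing the AT fraction of the 100 bp on either side of each element. A minimal Python sketch follows; the function names, the 0-based half-open coordinate convention, and the toy sequence are assumptions for illustration and do not reproduce the BioEdit workflow.

```python
def at_content(seq):
    """Fraction of A and T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "AT") / len(bases)


def flank_at_content(genome, start, end, flank=100):
    """AT content of the 5' and 3' flanks of an element.

    `start` and `end` are 0-based, with `end` exclusive (an assumption
    of this sketch); flanks are truncated at the sequence boundaries.
    """
    left = genome[max(0, start - flank):start]
    right = genome[end:end + flank]
    return at_content(left), at_content(right)


# Hypothetical toy sequence: AT-only 5' flank, an 8-nt "element",
# and a 3' flank that is half GC and half AT.
genome = "AT" * 50 + "GGGGCCCC" + "GC" * 25 + "AT" * 25
five_prime, three_prime = flank_at_content(genome, 100, 108)
```

Averaging these values over all elements and comparing them with the genome-wide AT content, or with randomly chosen intergenic and intronic windows, gives the kind of comparison described above.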

Genes in the vicinity of TCAST-like elements 

Uniprot gene numbers were used as identifiers of genes located in the 
vicinity of TCAST-like elements (gene names shown in Table 1). 
Uniprot gene numbers for homologous genes found in Drosophila 
melanogaster are also indicated (Table 1). Detailed descriptions of
the genes, including the molecular function of their protein products,
the biological processes in which these proteins are involved, and their
cellular localization (cellular component), are shown in Table S1. Each
identified gene is assigned to a particular TCAST-like element within
its vicinity, and the precise position of TCAST-like elements in the genomic
sequence (start and end site) is indicated (Table S1). Functional
analysis revealed that 17 of 101 genes correspond to putative uncharacterized
proteins, whereas the remaining genes are involved in different
molecular functions and diverse biological processes. Among
the proteins, a proportion is characterized by ATP binding activity (13
proteins) and involvement in protein phosphorylation and/or signal
transduction (9 proteins; Table S1).

To determine whether TCAST-like elements are distributed 
randomly relative to genes or whether they are overrepresented near 
specific groups of genes, we used GeneCodis 2.0 to provide a statistical 
representation of the genes associated with TCAST-like elements. 
Because many genes are still not annotated in T. castaneum and,
furthermore, T. castaneum genomic data are not included in GeneCodis,
we used gene numbers for orthologous genes from D. melanogaster
for the analysis and compared them with the whole set
of 14,869 genes annotated in D. melanogaster. GeneCodis analysis
revealed that TCAST-like elements are located near nine genes characterized
as members of the immunoglobulin protein superfamily.
Because there are only 134 immunoglobulin-like genes present
within the total set of D. melanogaster genes, random distribution
of TCAST-like elements would result in their occurrence near approximately
a single immunoglobulin-like gene. The presence of
TCAST-like elements in the vicinity of nine immunoglobulin-like
genes therefore represents a statistically significant overrepresentation
(P = 0.00000427). All nine genes exhibit structural features of
immunoglobulin-like, immunoglobulin subtype 1, and immunoglobulin
subtype 2 proteins and are associated with the following TCAST
transposon-like elements: 25 at the 3' end, 28 and 39 at the 5' end, and
32 and 40 within introns; and with the following TCAST satellite-like
elements: 8 at the 3' end, 19 and 62 at the 5' end, and 41 within an
intron (Table 1). The minimal distance between a TCAST-like element and
an immunoglobulin-like gene was 7165 bp, and the maximal distance was
173,881 bp (Table 1). The molecular
function of most of the immunoglobulin-like genes is unknown, and they
are involved in different biological processes such as cell adhesion,
protein phosphorylation, and axon guidance (Table S1). Although all
nine genes belong to the immunoglobulin superfamily, they did not
exhibit sequence similarity that would suggest a role of duplication in
their evolution and spread. The position of TCAST-like elements
relative to the genes also was not consistent with the possibility that
TCAST-like elements duplicated along with the immunoglobulin-like
genes.

Overrepresentation of TCAST-like elements was also found near
genes that exhibit ATP-binding activity and axon guidance properties,
but with marginal significance (P = 0.0183374 and P = 0.00865139, respectively). For
the remaining genes, no significant overrepresentation of TCAST-like
elements was detected. Thus, enrichment of TCAST-like elements in
the vicinity of immunoglobulin-like genes potentially implicates a
role of TCAST-like elements in the regulation of these genes.
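
The reasoning behind the immunoglobulin enrichment can be retraced with a short calculation of the expected count and an upper-tail hypergeometric probability. The Python sketch below is illustrative only. The background size (14,869), the number of immunoglobulin-like genes (134), and the observed count (9) are taken from the text, whereas the size of the tested gene list is an assumption (one ortholog for each of the 101 genes), because the number of D. melanogaster orthologs actually submitted is not stated. The result is therefore not expected to reproduce the GeneCodis value, which also depends on that program's own procedure.

```python
from math import comb


def upper_tail_p(k, population, successes, draws):
    """P(X >= k) under the hypergeometric distribution."""
    hi = min(draws, successes)
    favorable = sum(comb(successes, i) * comb(population - successes, draws - i)
                    for i in range(k, hi + 1))
    return favorable / comb(population, draws)


BACKGROUND_GENES = 14869   # annotated D. melanogaster genes (from the text)
IG_GENES = 134             # immunoglobulin-like genes in the background (from the text)
OBSERVED = 9               # immunoglobulin-like genes near TCAST-like elements (from the text)
LIST_SIZE = 101            # ASSUMPTION: one ortholog per gene near a TCAST-like element

# Expected number of immunoglobulin-like genes in a random list of this size.
expected = LIST_SIZE * IG_GENES / BACKGROUND_GENES

# Probability of observing nine or more such genes by chance.
p_value = upper_tail_p(OBSERVED, BACKGROUND_GENES, IG_GENES, LIST_SIZE)
```

Under this assumption the expected count is slightly below one gene, consistent with the statement that random distribution would place TCAST-like elements near approximately a single immunoglobulin-like gene.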

Figure 2 Organization of TCAST elements within the T. castaneum genome in the form of a TCAST transposon-like element, tandem arrays, and the CR1-3_TCa retrotransposon. Regions corresponding to the TCAST element are shown in red. For the TCAST transposon-like element, terminal inverted repeats (arrows), the unique nonsatellite sequence (green), the target-site duplication in the form of "ACT," and the insertion point of the 925-bp sequence found within the TR 1.9 element and coding for the putative transposase are shown. Three short ORFs within the TCAST transposon-like element are also indicated. Within the non-long terminal repeat retrotransposon CR1-3_TCa, regions corresponding to the 5'UTR and to two ORFs are indicated.

DISCUSSION

TEs are classified into several dozen families based on transposition
mechanisms and different dynamic properties (Hua-Van et al. 2005).
Active TEs encode the enzymes necessary for their transposition, 
either to move between nonhomologous regions in the genome or 
to copy themselves to other positions. In many cases, TEs do not 
produce their own enzymes but are able to use those from functional 
copies or even from other TE families. Defective and inactive TEs
often are amplified in regions of low recombination such as heterochromatin
and may form tandemly repeated satellite DNAs. The
origin of satellite DNA arrays from transposon-like elements has been
reported for many insects, such as Drosophila melanogaster (Agudo
et al. 1999), Drosophila guanche (Miller et al. 2000), and the beetle
Misolampus goudoti (Pons 2004), whereas retroviral-like features
were first observed in the satellite DNA from rodents of the genus
Ctenomys (Rossi et al. 1993).

Transposons can be inserted into other repetitive sequences such 
as satellite DNAs, as has been observed for the mariner-like element 
and MITE element, both inserted into satellite DNA of the ant Messor 
bouvieri (Palomeque et al. 2006). Searching for repetitive elements 
homologous to the TCAST repeat within Repbase (http://www. 
girinst.org/repbase/) revealed that 5' UTR of nonlong terminal re- 
peat retrotransposon CRl-3_TCa (Jurka 2009c) shares a high sim- 
ilarity of 83% with a 444-bp long TCAST sequence composed of 1.2 



tandem monomers (Figure 1). Other CR1 subfamilies identified 
within T. castaneum such as CR1-1_ TCa, CRl-2_TCa, and CR1- 
4_TCa, published in Repbase, do not share similarity to CR1-3 and 
do not contain TCAST similar sequence. We propose that CR1-3 was 
inserted within a TCAST satellite array and through recombination 
acquired a part of the TCAST sequence. The newly acquired TCAST 
element could act as a promoter, because TCAST satellite DNA has 
an internal promoter for RNA Pol II (Pezer and Ugarkovic 2012), 
and become a new functional 5' UTR. Subsequent retrotransposition 
of CR1-3_TCa could explain the dispersion of TCAST within 
the euchromatin (Figure 4). Three CR1-3_TCa elements with 
TCAST in the 5' UTR were identified within scaffolds that have 
not been mapped to linkage groups. However, truncated fragments 
with partial homology to the CR1-3_TCa retrotransposon can be mapped 
within the T. castaneum genome, some of them in the vicinity of 
TCAST elements. Such an arrangement also indicates a role for CR1-3_TCa 
in the spreading of TCAST elements. It is also possible 
that TCAST satellite DNA originates from the CR1-3 retrotransposon, 
which was, after inactivation, amplified within the heterochromatin 
region. In the case of TCAST transposon-like elements, part of the 
satellite sequence is incorporated within TIRs, which are characteristic 
of DNA transposons. The presence of target-site duplications 
at the sites of insertions of some TCAST transposon-like elements 
also indicates transposition as a mode of spreading of TCAST 
elements. Parts of satellite DNA elements can be found within some 
transposons, such as the pDv transposon (Evgen'ev et al. 1982; Zelentsova 
et al. 1986), whose long direct terminal repeats show significant 
sequence similarity to the pvB370 satellite DNA, located in the 
centromeric heterochromatin of a number of species of the Drosophila virilis 
group (Heikkinen et al. 1995). The presence of short stretches of 
PisTR-A satellite DNA sequences within the 3' UTR of Ogre 
retrotransposons dispersed in the pea (Pisum sativum) genome has also been 
reported (Macas et al. 2009). Furthermore, the mobilization of subtelomeric 
repeats upon excision of the transposable P element from tandemly 
repeated subtelomeric sequences has been observed (Thompson-Stewart 
et al. 1994). 

Figure 3 Distribution of TCAST-like elements on T. castaneum 
chromosomes. The karyotype representing the 
haploid set of T. castaneum chromosomes and the positions 
of constitutive heterochromatin (dark) and euchromatin 
(white) are depicted based on C-banding data 
(Stuart and Mocelin 1995) and the T. castaneum 3.0 assembly 
(http://www.beetlebase.org). TCAST transposon-like 
elements (blue) and TCAST satellite-like elements (red) 
are shown. Two TCAST-like elements are represented 
as separate lines if they are at least 100 kb distant from 
each other. 

Figure 4 Models of spreading of TCAST-like elements 
based on (A) retrotransposition of the CR1-3_TCa element. 
CR1-3_TCa was inserted within a TCAST satellite array 
and through recombination acquired a part of the 
TCAST sequence, which could act as a promoter and 
become a new functional 5' UTR. Subsequent retrotransposition 
of CR1-3_TCa could explain the dispersion of 
TCAST within the euchromatin. (B) Rolling-circle replication 
of TCAST satellite DNA sequences excised from 
their heterochromatin loci via intrastrand recombination, 
followed by reintegration into different genome 
locations by homologous recombination. 

Incorporation of part of a TCAST satellite DNA sequence into 
a (retro)transposable element, and its subsequent mobilization and 
spreading by (retro)transposition, may explain the distribution of 
TCAST elements in the vicinity of genes within euchromatin. Satellite 
DNA sequences are prone to undergo recurrent repeat copy number 
expansion and contraction in divergent lineages as well as among 
populations of the same species (Bosco et al. 2007). This amplification 
appears to be random and does not correlate with phylogeny of the 
species (Pons et al. 2004; Lee et al. 2005; Bulazel et al. 2007). Ampli- 
fication of a satellite sequence is reported to occur as a result of un- 
equal crossing over or duplicative transposition (Smith 1976; Ma and 
Jackson 2006). The discovery of extrachromosomal elements 
originating from satellite DNA arrays in cultured human cells and in 
different plant species indicates the possible existence of additional 
amplification mechanisms based on rolling-circle replication (Assum 
et al. 1993; Navratilova et al. 2008). It has been proposed that satellite 
sequences excised from their chromosomal loci via intrastrand recom- 
bination could be amplified in this way, followed by reintegration of 
tandem arrays into the genome (Feliciello et al. 2006). Moreover, it is 
possible that such a mechanism affected TCAST satellite DNA, and 
that extrachromosomal circles of TCAST were reintegrated into 
different genome locations by homologous recombination based 
on short stretches of sequence similarity between the TCAST satellite 
and the target genomic sequence (Figure 4). Integrated TCAST sequences 
are mainly composed of interspersed elements belonging to two 
major subfamilies, Tcast1a and Tcast1b, which is a prevalent type of 
organization in pericentromeric heterochromatin (Feliciello et al. 
2011). This finding indicates that dispersed euchromatic TCAST 
elements may have originated by duplication of heterochromatin 
copies. 

The distribution of TCAST-like elements relative to protein-coding 
genes revealed no specific preference for insertions within introns or 
at 5' or 3' ends of genes. TCAST-like elements are distributed on all 
chromosomes with no significant deviation in the number among the 
chromosomes, and phylogenetic analysis did not detect any significant 
sequence clustering of TCAST-like elements derived from the same 
chromosome. Dispersed TCAST satellite-like elements produce tan- 
dem arrays up to tetramers, but repeats from the same array do not 
reveal any significant clustering on phylogenetic trees. This finding 
indicates there is no significant difference in the homogenization of 
TCAST satellite-like repeats at the level of local arrays or chromosomes 
or among different chromosomes. The average pair-wise sequence 
divergence (6% for dispersed TCAST satellite-like repeats) is greater 
than the usual divergence of satellite elements located in heterochro- 
matin of tenebrionid beetles [approximately 2% (Ugarkovic et al. 
1996)]. This difference in homogeneity between repeats located in 
heterochromatin and euchromatin may be explained by a lower rate 
of gene conversion affecting dispersed satellite-like elements or by 
a specific mechanism of DNA repair acting on satellite DNA (Feliciello 
et al. 2006). TCAST transposon-like elements dispersed among the 
genes within euchromatin have an even greater average sequence 
divergence (approximately 12%) and also exhibit no significant 
chromosome-specific sequence clustering, indicating a similar rate of 
homogenization within and among the chromosomes. The relatively high 
sequence divergence of TCAST transposon-like elements and the 
significant truncation of the majority of them indicate that the 
transposition of these elements did not occur very recently and that these 
elements could be considered molecular fossils of the functional 
TCAST transposon-like elements. 
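
The average pair-wise sequence divergence discussed above can be illustrated with a minimal sketch. It assumes that the repeats have already been aligned to equal length and that divergence is measured as an uncorrected p-distance (the proportion of differing sites, ignoring gap columns); the distance measure actually used for the values reported here may differ. The function names and the toy alignment are hypothetical and are not data from this study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: not counted as a comparable site
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def mean_pairwise_divergence(aligned_seqs: list[str]) -> float:
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not TCAST data), for illustration only.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
toy_divergence = mean_pairwise_divergence(toy_alignment)
```

Applied separately to heterochromatic repeats, dispersed satellite-like repeats, and transposon-like elements, a calculation of this kind would yield the three values being compared (approximately 2%, 6%, and 12%).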

Cis-regulatory elements, such as promoters or transcription factor 
binding sites, are predicted in some satellite DNAs (Pezer et al. 2011). 
Transcription from promoters for RNA Pol II is also characteristic for 
pericentromeric satellite DNAs from the beetles Palorus ratzeburgii 
and Palorus subdepressus (Pezer and Ugarkovic 2008, 2009). Temper- 
ature-sensitive transcription of TCAST satellite DNA from an internal 
RNA Pol II promoter has been demonstrated (Pezer and Ugarkovic 
2012). Based on these findings, it can be proposed that TCAST ele- 
ments located in the vicinity of genes may function as alternative 
promoters, and transcripts derived from them may interfere with 
the expression of neighboring genes. This type of regulation is often 
observed for retrotransposons positioned immediately 5' of protein 
genes (Faulkner et al. 2009). In addition, some tissue-specific gene 
promoters are derived from retrotransposons (Ting et al. 1992; 
Samuelson et al. 1996). Because of rapid evolutionary turnover, satel- 
lite DNA sequences often are restricted to a group of closely related 
species, or in some instances are species specific. This is the case with 
TCAST satellite DNA, which is not even detected in the congeneric 
Tribolium species. If restricted satellite DNAs have regulatory potential, 
then insertion of these elements in the vicinity of genes could contribute 
to the establishment of lineage-specific or species-specific 
patterns of gene expression. Annotation of genes in proximity to 
TCAST-like elements demonstrated a statistical overrepresentation 
of certain groups of genes, for example, those with immunoglobulin-like 
domains. Recently, in the fish Salvelinus fontinalis, a regulatory 
role of a 32-bp satellite repeat, located in an intron of the major 
histocompatibility complex gene (MHIIβ), on MHIIβ gene expression 
was demonstrated (Croisetiere et al. 2010). The level of gene expression 
depends on temperature, as well as the number of satellite 
repeats, and indicates a role for temperature-sensitive satellite DNA 
in gene regulation of the adaptive immune response. Further studies 
are necessary to determine whether TCAST-like elements exert a 
regulatory role on nearby genes. The transcriptional potential 
of satellite DNAs, as well as their distribution close to protein-coding 
genes, as shown in this study, provides strong support for the idea that, 
in addition to transposons, satellite DNAs represent a rich source for the 
assembly of gene regulatory systems. 